Abstract

Let G = (V, E) be a directed acyclic graph with two distinguished vertices s, t, and let F be a set of forbidden pairs of vertices. We say that a path in G is safe if it contains at most one vertex from each pair $\{u, v\} \in F$. Given G and F, the path avoiding forbidden pairs (PAFP) problem is to find a safe s-t path in G.

We systematically study the complexity of different special cases of the PAFP problem defined by the mutual positions of forbidden pairs. Fix one topological ordering < of vertices; we say that pairs {u, v} and {x, y} are disjoint if u < v < x < y, nested if u < x < y < v, and halving if u < x < v < y.

The PAFP problem is known to be NP-hard in general or if no two pairs are disjoint; we prove that it remains NP-hard even when no two forbidden pairs are nested. On the other hand, if no two pairs are halving, the problem is known to be solvable in cubic time. We simplify and improve this result by showing an $O(M(n))$ time algorithm, where M(n) is the time to multiply two $n \times n$ boolean matrices.
1. Introduction

Let G = (V, E) be a directed graph with two distinguished vertices $s, t \in V$ and let $F \subseteq V \times V$ be a set of forbidden pairs of vertices. We say that a path $\pi$ is safe if it does not contain any forbidden pair, i.e., $\pi$ contains at most one vertex from each pair $\{u, v\} \in F$. Given G and F, the path avoiding forbidden pairs problem (henceforth PAFP) is to find a safe s-t path in G. In this paper, we study the complexity of different special cases of the problem on directed acyclic graphs.
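The definition of a safe path translates directly into code. The following is a minimal sketch, assuming the DAG is given as an adjacency dictionary and F as a list of vertex pairs; the names `is_safe` and `find_safe_path` are hypothetical. The search simply enumerates paths depth-first and is exponential in the worst case, which is in line with the NP-hardness results discussed below, so it is only meant as a reference for tiny instances.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

Vertex = Hashable
Pair = Tuple[Vertex, Vertex]


def is_safe(path: Sequence[Vertex], forbidden: Iterable[Pair]) -> bool:
    """A path is safe if it contains at most one vertex of every forbidden pair."""
    on_path = set(path)
    return not any(u in on_path and v in on_path for u, v in forbidden)


def find_safe_path(adj: Dict[Vertex, List[Vertex]], forbidden: Iterable[Pair],
                   s: Vertex, t: Vertex) -> Optional[List[Vertex]]:
    """Exhaustive depth-first search for a safe s-t path in a DAG (exponential time)."""
    partner: Dict[Vertex, set] = {}
    for u, v in forbidden:
        partner.setdefault(u, set()).add(v)
        partner.setdefault(v, set()).add(u)
    path, on_path = [s], {s}

    def extend(u: Vertex) -> bool:
        if u == t:
            return True
        for w in adj.get(u, []):
            if partner.get(w, set()) & on_path:
                continue  # w would complete a forbidden pair with a vertex already used
            path.append(w)
            on_path.add(w)
            if extend(w):
                return True
            path.pop()
            on_path.discard(w)
        return False

    return list(path) if extend(s) else None


# Usage: two routes from s to t; the route through both a and b is unsafe.
adj = {"s": ["a", "b"], "a": ["b"], "b": ["t"]}
F = [("a", "b")]
print(find_safe_path(adj, F, "s", "t"))  # ['s', 'b', 't']
```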

1.1. Motivation 

The PAFP problem was first studied by Krause et al. [1] and by Srimani and Sinha [2], motivated by the design of test cases for automatic software testing and validation. We can represent a program as a directed graph where vertices represent segments of code and edges represent the flow of control from one code segment into another. The goal is to cover this graph with s-t paths corresponding to different test cases. However, not all paths correspond to executable sequences in the program. Therefore, Krause et al. [1] introduced forbidden pairs, which identify mutually exclusive code segments, and formulated the PAFP problem. Unfortunately, as shown by Gabow et al. [3], the problem is NP-hard even for directed acyclic graphs.

A different motivation came from bioinformatics and the problem of peptide sequencing via tandem mass spectrometry. Peptides are polymers that can be thought of as strings over a 20-character alphabet of amino acids, and the sequencing problem is to determine the amino acid sequence of a given peptide. To this end, many copies of the peptide are fragmented and the masses of the fragments are measured (very precisely) by a mass spectrometer. The result of the experiment is a mass spectrum where each peak corresponds to the mass of some prefix or some suffix of the amino acid sequence, or is noise. The spectrum is then compared against a database of known fragment weights.
Chen et al. [4] suggested the following formulation of the peptide sequencing problem. Let us create a spectrum graph with two vertices $p_i$ and $s_i$ for each peak $w_i$, with weights $w(p_i) = w_i - 1$ and $w(s_i) = W - w_i + 1$, where W is the weight of the whole peptide. We add an edge from x to y if the difference between the weights $w(y) - w(x)$ equals the total mass of some known sequence of amino acids. Thus, paths in this graph correspond to amino acid sequences. Paths going through $p_i$ correspond to $w_i$ being the weight of some prefix and, similarly, paths going through $s_i$ correspond to $w_i$ being the weight of some suffix. (Paths going through neither $p_i$ nor $s_i$ correspond to $w_i$ being noise.) However, $w_i$ cannot be a prefix weight and a suffix weight at the same time, so $\{p_i, s_i\}$ forms a forbidden pair for each i. This is a very special case of the PAFP problem in directed acyclic graphs where all the forbidden pairs are nested, and Chen et al. [4] showed that it is polynomially solvable.

The PAFP problem on directed acyclic graphs also arose in a completely different application in bioinformatics: gene finding using RT-PCR tests [5]. In this application, we have a so-called splicing graph where vertices represent non-overlapping segments of the DNA sequence, the length of a vertex is the number of nucleotides in this segment, and an edge (u, v) indicates that segment v immediately follows segment u in some gene transcript. Thus, paths in this splicing graph correspond to putative genes. The problem is to identify the true genes with the help of information from RT-PCR experiments.

Without going into biological details, let us define a (simplified) result of an RT-PCR experiment as a triple $t = (u, v, \ell)$, where $u, v \in V$ are two vertices and $\ell$ is the length of the product. Let $\pi$ be a path going through u and v in the splicing graph; if the length of the u-v subpath is equal to $\ell$, we say that $\pi$ explains test t; otherwise, it is inconsistent with test t. We define the score of a path $\pi$ with respect to a set of tests T as the sum of the scores of all of its vertices and edges, plus a bonus B for each explained test from T, minus a penalty P for each inconsistent test. The gene finding with RT-PCR tests problem is to find an s-t path with the highest score in the given splicing graph G with a set of RT-PCR tests T.

Note that if we set all lengths to an unattainable value, say −1, and set a high (infinite) penalty P for inconsistent tests, we basically get the PAFP problem. Thus, the PAFP problem is at the core of gene finding with RT-PCR tests, and the latter problem inherits all NP-hardness results for the PAFP problem. On the positive side, we have shown in our previous work [5] that some polynomial solutions for special cases of the PAFP problem can be extended to pseudo-polynomial algorithms for the gene finding problem.
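The score of a path with respect to a set of RT-PCR tests, and the reduction just described, can be illustrated as follows. This is a sketch under stated assumptions: vertex scores, edge scores, and vertex lengths are dictionaries, and the length of the u-v subpath is taken to be the total length of the vertices from u to v inclusive (the text does not fix this convention). The function name `path_score` is hypothetical. A test whose two vertices are not both on the path contributes nothing.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Test = Tuple[Hashable, Hashable, int]  # (u, v, product length)


def path_score(path: Sequence[Hashable], vscore: Dict, escore: Dict, length: Dict,
               tests: List[Test], bonus: float, penalty: float) -> float:
    """Vertex and edge scores, +bonus per explained test, -penalty per inconsistent test."""
    pos = {v: i for i, v in enumerate(path)}
    score = sum(vscore[v] for v in path)
    score += sum(escore[e] for e in zip(path, path[1:]))
    for u, v, ell in tests:
        if u not in pos or v not in pos:
            continue  # the path does not go through both u and v: the test is irrelevant
        i, j = sorted((pos[u], pos[v]))
        # Assumption: the u-v subpath length includes both end segments u and v.
        sub_len = sum(length[x] for x in path[i:j + 1])
        score += bonus if sub_len == ell else -penalty
    return score


path = ["s", "a", "b", "t"]
vscore = {v: 1 for v in path}
escore = {e: 0 for e in zip(path, path[1:])}
length = {"s": 0, "a": 5, "b": 7, "t": 0}
tests = [("a", "b", 12), ("a", "t", 3), ("x", "b", 1)]
print(path_score(path, vscore, escore, length, tests, bonus=10, penalty=4))  # 10
# PAFP as a special case: {a, b} becomes an unattainable test with infinite penalty.
print(path_score(path, vscore, escore, length, [("a", "b", -1)], bonus=1, penalty=math.inf))  # -inf
```

With length −1 and an infinite penalty, every path through both a and b scores −∞, exactly as if {a, b} were a forbidden pair.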

1.2. Previous results 

As shown by Gabow et al. [3], the PAFP problem is NP-hard in general, but several special cases are polynomially solvable. Yinnone [6] studied the PAFP problem under skew symmetry conditions, where for each two forbidden pairs $\{u, u'\}, \{v, v'\} \in F$, if there is an edge from u to v, there is also an edge from v' to u'. He proved that under such conditions, the problem is polynomially equivalent to finding an augmenting path with respect to a given matching and is thus polynomially solvable.

For directed acyclic graphs, we have already mentioned that the nested case is solvable in polynomial time [4]; Kolman and Pangrac [7] were able to devise a polynomial algorithm if the set of forbidden pairs has a well-parenthesized or a halving structure (see Preliminaries).

Recently, the approximability and parameterized complexity of the PAFP problem have been studied. When minimizing the number of forbidden pairs on an s-t path, we add 1 to the objective function to disallow zero-cost solutions; otherwise the problem is trivially inapproximable. Hajiaghayi et al. [8] showed that even then there is a constant c > 0 such that minimizing 1 + the number of forbidden pairs on an s-t path is not $c \cdot n$-approximable. Bodlaender et al. [9] studied the PAFP problem on undirected graphs. When parameterized by the vertex cover number of G = (V, E), the problem is W[1]-hard (the proof also carries over to directed acyclic graphs). On the other hand, when parameterized by the vertex cover number of H = (V, F) (where edges are forbidden pairs), the problem is fixed-parameter tractable (FPT), but has no polynomial kernel unless $NP \subseteq coNP/poly$. The problem is also FPT when parameterized by the treewidth of $G \cup H$.

1.3. Contributions and road map 

In this paper, we systematically study different special cases of the PAFP problem on directed acyclic graphs. In the next section, we introduce the different special cases based on mutual positions of forbidden pairs. In Section 3, we prove that the PAFP problem is NP-hard even if the set of forbidden pairs has ordered structure, and in Sections 4 and 5, we improve upon the results of Chen et al. [4] and Kolman and Pangrac [7] for the nested, halving, and well-parenthesized forbidden pairs.
Table 1: Complexity of the PAFP problem for its different special cases; n and m denote the number of vertices and edges of G, respectively; $O(n^\omega)$ is the complexity of boolean matrix multiplication, $\omega < 2.3727$ [10, 11].
2. Preliminaries

Let G = (V, E) be a directed acyclic graph and let F be the set of forbidden pairs. As already noticed by Yinnone [6] and Kolman and Pangrac [7], we may assume that every vertex except for s and t belongs to exactly one forbidden pair, i.e., $\bigcup F = V - \{s, t\}$. This is simply because if vertex v does not belong to any forbidden pair, we can remove it and replace all 2-edge paths u, v, w by a direct edge (u, w). On the other hand, if v belongs to k > 1 forbidden pairs, we can replace it by a directed path on k vertices and move the ends of forbidden pairs to different vertices on this path.
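Both normalization steps are mechanical. The sketch below assumes the vertices are listed in topological order and that s and t lie in no forbidden pair; the function name `normalize` and the naming scheme (v, i) for the copies of a split vertex are hypothetical. A split vertex keeps its position in the order, with in-edges entering its first copy and out-edges leaving its last copy, so the result is still topologically ordered and every path through v passes all its copies.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def normalize(order: List[Hashable], edges: Iterable[Edge], pairs: List[Edge],
              s: Hashable, t: Hashable):
    """Make every vertex other than s and t belong to exactly one forbidden pair."""
    uses = {}                                    # vertex -> indices of the pairs containing it
    for j, pair in enumerate(pairs):
        for v in pair:
            uses.setdefault(v, []).append(j)
    new_pairs = [list(p) for p in pairs]
    first, last, new_order, path_edges = {}, {}, [], set()
    for v in order:
        k = len(uses.get(v, []))
        if k <= 1:
            first[v] = last[v] = v
            new_order.append(v)
            continue
        copies = [(v, i) for i in range(k)]      # hypothetical names for the k copies of v
        first[v], last[v] = copies[0], copies[-1]
        new_order.extend(copies)
        path_edges |= set(zip(copies, copies[1:]))   # the directed path replacing v
        for i, j in enumerate(uses[v]):              # the i-th pair at v moves to the i-th copy
            new_pairs[j][new_pairs[j].index(v)] = copies[i]
    # In-edges of v enter its first copy, out-edges leave its last copy.
    E: Set[Edge] = {(last[a], first[b]) for a, b in edges} | path_edges
    in_pair = {x for p in new_pairs for x in p}
    for v in list(new_order):                    # contract every vertex lying in no pair
        if v in (s, t) or v in in_pair:
            continue
        preds = {a for a, b in E if b == v}
        succs = {b for a, b in E if a == v}
        E = {(a, b) for a, b in E if v not in (a, b)} | {(a, b) for a in preds for b in succs}
        new_order.remove(v)
    return new_order, E, [tuple(p) for p in new_pairs]


order = ["s", "a", "x", "b", "c", "t"]
edges = {("s", "a"), ("a", "x"), ("x", "b"), ("b", "c"), ("c", "t"), ("s", "c")}
V, E, F = normalize(order, edges, [("a", "c"), ("b", "c")], "s", "t")
print(V)  # ['s', 'a', 'b', ('c', 0), ('c', 1), 't']
print(F)  # [('a', ('c', 0)), ('b', ('c', 1))]
```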

To define special cases of interest, we fix one topological ordering of vertices. We say that vertex u is before or 
precedes v, u < v, if u precedes v in this linear order. Let us denote the forbidden pairs $\{f_i, f'_i\}$ for $i = 1, \dots, k$, where $f_i < f'_i$ and $f_1 < f_2 < \dots < f_k$, i.e., we order them by the position of the left member of the pair.

We recognize three possible types of mutual position of pairs {u, v} and {x, y} (without loss of generality, let u < v, x < y, and u < x): disjoint (u < v < x < y; see Fig. 1(a)), nested (u < x < y < v; see Fig. 1(b)), and halving (u < x < v < y; see Fig. 1(c)). All the special cases are obtained by restricting the set of forbidden pairs F to only certain types of mutual positions (see Table 1). This gives us $2^3 = 8$ cases, of which the following 6 classes are non-trivial and interesting:



Figure 1: Different mutual positions of two forbidden pairs: (a) disjoint pairs, (b) nested pairs, (c) halving pairs.



1. general case - there are no constraints on the positions of pairs;

2. overlapping structure¹ - every two forbidden pairs overlap (they may be nested or halving, but not disjoint); as a consequence, $f_1 < f_2 < \dots < f_k < f'_{\sigma(1)} < f'_{\sigma(2)} < \dots < f'_{\sigma(k)}$ for some permutation $\sigma$;

3. ordered - there may be disjoint and halving pairs, but no two forbidden pairs are nested; as a consequence, $f_1 < f_2 < \dots < f_k$ and $f'_1 < f'_2 < \dots < f'_k$;

4. well-parenthesized - there may be disjoint and nested pairs, but no two pairs are halving; this case deserves its name since if we write $(_i$ and $)_i$ for the i-th pair, we get a well-parenthesized sequence;

5. halving - every two pairs halve each other, i.e., $f_1 < f_2 < \dots < f_k < f'_1 < f'_2 < \dots < f'_k$;

6. nested - there are only nested pairs, i.e., the vertices in forbidden pairs are ordered $f_1 < f_2 < \dots < f_k < f'_k < \dots < f'_2 < f'_1$; this is a special case of the well-parenthesized case.
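For a fixed topological ordering, the classification can be checked mechanically. The sketch below assumes every vertex lies in at most one pair (as arranged at the beginning of this section), so the four endpoints of two pairs are distinct; the names `position`, `special_cases`, and `CLASSES` are hypothetical. Note that a set of pairs usually belongs to several of the six classes at once; for example, nested pairs are also overlapping and well-parenthesized.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]

# Allowed mutual positions for each of the six special cases.
CLASSES = {
    "general": {"disjoint", "nested", "halving"},
    "overlapping": {"nested", "halving"},
    "ordered": {"disjoint", "halving"},
    "well-parenthesized": {"disjoint", "nested"},
    "halving": {"halving"},
    "nested": {"nested"},
}


def position(p: Pair, q: Pair, rank: Dict[Hashable, int]) -> str:
    """Mutual position of two forbidden pairs with four distinct vertices."""
    u, v = sorted(p, key=rank.__getitem__)
    x, y = sorted(q, key=rank.__getitem__)
    if rank[x] < rank[u]:                # w.l.o.g. u < x
        (u, v), (x, y) = (x, y), (u, v)
    if rank[v] < rank[x]:
        return "disjoint"                # u < v < x < y
    if rank[y] < rank[v]:
        return "nested"                  # u < x < y < v
    return "halving"                     # u < x < v < y


def special_cases(order: Sequence[Hashable], pairs: List[Pair]) -> List[str]:
    """All special cases (w.r.t. the given topological order) containing the set of pairs."""
    rank = {v: i for i, v in enumerate(order)}
    found = {position(p, q, rank) for p, q in combinations(pairs, 2)}
    return [name for name, allowed in CLASSES.items() if found <= allowed]


order = ["s", "a", "b", "c", "d", "t"]
print(special_cases(order, [("a", "d"), ("b", "c")]))
# ['general', 'overlapping', 'well-parenthesized', 'nested']
print(special_cases(order, [("a", "c"), ("b", "d")]))
# ['general', 'overlapping', 'ordered', 'halving']
```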



¹ Note that this special case is referred to as halving structure by Kolman and Pangrac [7]; we reserve the term "halving" for sets where every two pairs halve each other.
The previous work and our own results are summarized in Table 1.

For completeness and as a warm-up, we include our own proof of NP-hardness of the PAFP problem in the general 
and overlapping case. This proof is also simpler than the one given by Kolman and Pangrac [7]. 

Theorem 1. The PAFP problem is NP-hard, even when the set of forbidden pairs has overlapping structure. 

Proof. By reduction from 3-SAT: Let $\varphi = \bigwedge_{1 \le i \le n} \varphi_i$ be a formula over m variables $x_1, \dots, x_m$ with n clauses $\varphi_i = (\ell_{i,1} \lor \ell_{i,2} \lor \ell_{i,3})$, where each literal $\ell_{i,j}$ is either $x_k$ or $\neg x_k$. We will construct a graph G and a set of forbidden pairs F such that there is an s-t path avoiding pairs in F if and only if $\varphi$ is satisfiable.




Figure 2: Input for the PAFP problem for the formula $\varphi_1 \land \varphi_2 \land \dots \land \varphi_n$. All edges are directed from left to right.

G consists of two parts. The first part contains a vertex for each variable $x_k$ and its negation $\neg x_k$ (see Fig. 2). A path traversing this first part corresponds to a truth assignment of variables where the visited vertices are true. The second part contains a vertex for each literal $\ell_{i,j}$ (see Fig. 2), so that a path traversing it picks one literal from each clause. Forbidden pairs connecting every literal in the first part to every occurrence of its negation in the second part of G ensure that we can only go through "true" vertices. Thus, an s-t path avoiding F exists if and only if every clause is satisfied. Since every forbidden pair starts in the first part and ends in the second part, all pairs overlap. □
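The construction can be made concrete with a small sketch. Since Fig. 2 is not reproduced here, the wiring is an assumption: s is followed by one layer {x_k, ¬x_k} per variable and then by one layer of literal occurrences per clause, with consecutive layers completely connected. Literals are DIMACS-style integers (k for x_k, −k for ¬x_k), the function names are hypothetical, and both checks are brute force, so the example only confirms the equivalence on tiny formulas.

```python
from itertools import product
from typing import Dict, List, Sequence

Clause = Sequence[int]  # DIMACS-style literals: k means x_k, -k means not x_k


def sat_to_pafp(clauses: List[Clause], m: int):
    """Graph and forbidden pairs of the construction in the proof of Theorem 1 (sketch)."""
    adj: Dict = {}
    prev = ["s"]
    for k in range(1, m + 1):                # first part: one layer {x_k, not x_k} per variable
        layer = [("var", k), ("var", -k)]
        for a, b in product(prev, layer):
            adj.setdefault(a, []).append(b)
        prev = layer
    pairs = []
    for j, clause in enumerate(clauses):     # second part: one layer of occurrences per clause
        layer = [("occ", j, i) for i in range(len(clause))]
        for a, b in product(prev, layer):
            adj.setdefault(a, []).append(b)
        for i, lit in enumerate(clause):     # an occurrence of lit is forbidden with "-lit is true"
            pairs.append((("var", -lit), ("occ", j, i)))
        prev = layer
    for a in prev:
        adj.setdefault(a, []).append("t")
    return adj, pairs


def has_safe_path(adj, pairs, s="s", t="t") -> bool:
    """Exponential brute-force search for a safe s-t path (tiny instances only)."""
    partner: Dict = {}
    for a, b in pairs:
        partner.setdefault(a, set()).add(b)
        partner.setdefault(b, set()).add(a)

    def dfs(u, seen: frozenset) -> bool:
        return u == t or any(dfs(w, seen | {w}) for w in adj.get(u, [])
                             if not partner.get(w, set()) & seen)

    return dfs(s, frozenset([s]))


def satisfiable(clauses: List[Clause], m: int) -> bool:
    """Brute-force check over all 2^m truth assignments."""
    return any(all(any((lit > 0) == val[abs(lit) - 1] for lit in c) for c in clauses)
               for val in product([False, True], repeat=m))


for clauses, m in [([(1, 2, 3), (-1, -2, 3)], 3), ([(1, 1, 1), (-1, -1, -1)], 1)]:
    adj, pairs = sat_to_pafp(clauses, m)
    print(has_safe_path(adj, pairs), satisfiable(clauses, m))
# True True
# False False
```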



3. Ordered forbidden pairs 

In this section, we turn to a seemingly more restricted version of the PAFP problem, allowing only disjoint and 
halving forbidden pairs. This special case has not been studied before. 

Theorem 2. The PAFP problem is NP-hard, even when the set of forbidden pairs is ordered. 

Proof. We will prove the claim by reduction from 3-SAT. Let $\varphi$ be a logical formula over m variables $x_1, \dots, x_m$, which is a conjunction of n clauses $\varphi_1 \land \dots \land \varphi_n$, where $\varphi_i = (\ell_{i,1} \lor \ell_{i,2} \lor \ell_{i,3})$ and each literal $\ell_{i,j}$ is either $x_k$ or $\neg x_k$. We will construct a graph G with a linear order $\prec$ on its vertices and an ordered set of forbidden pairs F such that there is an s-t path avoiding pairs in F if and only if $\varphi$ is satisfiable.



Graph G consists of several blocks B and $B_\ell$ of 2m vertices, shown in Fig. 3(a) and Fig. 3(b). The blocks are connected together as outlined in Fig. 3(c). Any left-to-right path through a block B naturally corresponds to a truth assignment of the variables and, since $B_\ell$ has an isolated vertex $\neg\ell$, a path through block $B_\ell$ corresponds to an assignment where $\ell$ is true. A clause gadget consists of three such blocks, each corresponding to one literal. Any s-t path must pass through one of the three blocks and thus choose an assignment that satisfies the clause. The forbidden pairs in F enforce that the assignment of the variables is the same in all blocks. This is done by adding a forbidden pair between every literal $\ell'$ in the $B_\ell$-blocks and its counterpart $\neg\ell'$ in the previous and the following B-block.

The order of literals in a B-block is $\neg x_1 \prec x_1 \prec \neg x_2 \prec \dots \prec x_m$, while the order in a $B_\ell$-block is $x_1 \prec \neg x_1 \prec x_2 \prec \dots \prec \neg x_m$. Let $v^i_1 \prec v^i_2 \prec v^i_3 \prec \dots$ be the order of vertices in graph $G^i$. A zipping operation takes graphs $G^1, G^2, G^3$ and produces a new graph $G^1 \cup G^2 \cup G^3$ with vertices ordered $v^1_1 \prec v^2_1 \prec v^3_1 \prec v^1_2 \prec v^2_2 \prec v^3_2 \prec \dots$. The clause gadgets are produced by zipping the three blocks corresponding to their literals. If we do not allow multiple forbidden pairs to start or end in the same vertex, we can replace the vertices of G by short paths as in Fig. 3(d). It is easy to check that under such a linear order, no two pairs in F are nested. □
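The two block orders and the zipping operation can be sketched as follows. The edge structure inside a block is an assumption (consecutive variable layers completely connected), since Fig. 3 is not reproduced here; the forbidden pairs between blocks are omitted, and the function names are hypothetical. Because zipping preserves the relative order inside each block, every block edge still points left to right in the zipped order.

```python
from typing import List, Optional, Tuple

Vertex = Tuple[str, int]  # (block name, DIMACS-style literal: k for x_k, -k for not x_k)


def block(name: str, m: int, lit: Optional[int] = None) -> Tuple[List[Vertex], List[Tuple[Vertex, Vertex]]]:
    """Vertex order and edges of a B-block (lit=None) or of a B_lit-block (sketch).

    B is ordered -x1 < x1 < -x2 < x2 < ..., B_lit is ordered x1 < -x1 < x2 < -x2 < ...,
    and in B_lit the vertex of the negated literal -lit is isolated.
    """
    sign = -1 if lit is None else 1
    order = [(name, x) for k in range(1, m + 1) for x in (sign * k, -sign * k)]
    isolated = None if lit is None else -lit
    # Assumption: consecutive variable layers are completely connected.
    layers = [[(name, x) for x in (k, -k) if x != isolated] for k in range(1, m + 1)]
    edges = [(a, b) for cur, nxt in zip(layers, layers[1:]) for a in cur for b in nxt]
    return order, edges


def zip_orders(*orders: List[Vertex]) -> List[Vertex]:
    """Zipping: v^1_1 < v^2_1 < v^3_1 < v^1_2 < v^2_2 < v^3_2 < ..."""
    assert len({len(o) for o in orders}) == 1, "zipped blocks must have the same size"
    return [v for column in zip(*orders) for v in column]


# Clause (x1 or not x2 or x2) over m = 2 variables: zip B_{x1}, B_{-x2}, B_{x2}.
blocks = [block(f"B{i}", 2, lit) for i, lit in enumerate([1, -2, 2], start=1)]
gadget_order = zip_orders(*(order for order, _ in blocks))
gadget_edges = [e for _, edges in blocks for e in edges]
print(gadget_order[:3])  # [('B1', 1), ('B2', 1), ('B3', 1)]
```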
(a) Block B: vertices of this graph correspond to positive and negative literals; a path through this block corresponds to a truth assignment of the variables.

(b) Block $B_\ell$ is similar to a B-block, except that the order of vertices is different and vertex $\neg\ell$ is isolated. Thus, a path through $B_\ell$ corresponds to an assignment where $\ell$ is true.

(c) Construction of G from the blocks and zipped blocks corresponding to the clauses. Forbidden pairs enforce that the assignment of variables is the same in all blocks.

(d) An enlarged view of graph G showing block B, the following blocks for clause $\varphi_k$, and the way they are connected by forbidden pairs. Note that no two forbidden pairs are nested.

Figure 3: Construction of the graph G for a 3-SAT formula $\varphi$. All edges are directed from left to right.



4. Well-parenthesized forbidden pairs 

The first polynomial algorithm for the PAFP problem with well-parenthesized forbidden pairs was given by Kolman and Pangrac [7]. Their algorithm uses three rules for reducing the input graph:

1. contraction of a vertex - if v does not appear in any forbidden pair, remove it and add a direct edge (u, w) for 
every pair of edges (u, v), (v, w); 

2. removal of an edge - if an edge $e \in E \cap F$ joins two vertices that make up a forbidden pair, remove e from E;

3. removal of a forbidden pair - if $(u, v) \in F$ is a forbidden pair, but there is no path from u to v, remove (u, v) from F.

These three rules are applied repeatedly to the input graph until we end up with vertices s and t only, either joined by an edge or disconnected, which is a trivial problem.
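A direct, unoptimized rendering of the three rules is sketched below; the function name is hypothetical and no attempt is made to reach the $O(n^2 m)$ bound. To avoid depending on the topological order, rule 3 checks reachability in both directions. The function returns None if the rules get stuck before only s and t remain, which should not happen for well-parenthesized pairs: an innermost pair has no vertex strictly between its ends once free vertices are contracted, so rule 2 and then rule 3 apply to it.

```python
from typing import FrozenSet, Hashable, Iterable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def reduce_by_rules(vertices: Iterable[Hashable], edges: Iterable[Edge],
                    pairs: Iterable[Edge], s: Hashable, t: Hashable) -> Optional[bool]:
    """Apply the three reduction rules until none applies (straightforward sketch)."""
    V: Set[Hashable] = set(vertices)
    E: Set[Edge] = set(edges)
    F: Set[FrozenSet] = {frozenset(p) for p in pairs}

    def reachable(a, b) -> bool:
        stack, seen = [a], {a}
        while stack:
            x = stack.pop()
            if x == b:
                return True
            for p, q in E:
                if p == x and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return False

    changed = True
    while changed:
        changed = False
        in_pair = set().union(*F)
        for v in V - {s, t} - in_pair:                 # rule 1: contract a free vertex
            preds = {a for a, b in E if b == v}
            succs = {b for a, b in E if a == v}
            E = {e for e in E if v not in e} | {(a, b) for a in preds for b in succs}
            V.discard(v)
            changed = True
        bad = {e for e in E if frozenset(e) in F}       # rule 2: edge joining a forbidden pair
        if bad:
            E -= bad
            changed = True
        for p in list(F):                               # rule 3: pair not connected by a path
            a, b = tuple(p)
            if not reachable(a, b) and not reachable(b, a):
                F.discard(p)
                changed = True
    return (s, t) in E if V == {s, t} else None


E1 = {(0, 1), (1, 2), (2, 4), (4, 5), (0, 3), (3, 5)}
E2 = {(0, 1), (1, 4), (4, 5), (0, 2), (2, 3), (3, 5)}
F = [(1, 4), (2, 3)]                            # nested, hence well-parenthesized
print(reduce_by_rules(range(6), E1, F, 0, 5))   # True  (safe path 0, 3, 5)
print(reduce_by_rules(range(6), E2, F, 0, 5))   # False (both s-t paths contain a pair)
```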

A simple implementation of this approach gives an $O(n^2 m)$ algorithm. Using fast matrix multiplication, the time complexity can be reduced to $O(n^{\omega+1}) \approx O(n^{3.373})$, and using a dynamic data structure for "finding paths and deleting edges in directed acyclic graphs" by Italiano [12], it can be reduced still further to $O(n^3)$.

Here we describe our own $O(n^3)$ algorithm, its advantages being simplicity, extensibility, and improvability: the algorithm does not use any advanced data structures. It can be easily extended to solve problems such as

• find an s-t path passing the minimum number of forbidden pairs or 
• given a graph where all edges have scores and there are bonuses or penalties for some (well-parenthesized) pairs of vertices, find an s-t path with maximum score (this problem was considered by Kovac et al. [5]).

It seems unlikely that these problems can be solved using the former approach (because of rule 2). Furthermore, our algorithm can be improved using Valiant's technique and fast matrix multiplication algorithms [13, 14] or the Four-Russians technique [15]. Note that the reduction to matrix multiplication is not only of theoretical interest, since there are fast and practical hardware-based solutions for multiplying two matrices [16, 17].

Theorem 3. The PAFP problem with well-parenthesized forbidden pairs can be solved in $O(n^3)$ time.

Proof. We modify the input graph so that no two forbidden pairs start or end in the same vertex. Let P[u, v] be true if a safe u-v path exists, and let J[u, v] be true if there is a forbidden pair $(q, v) \in F$ with u < q < v and there is a safe u-v path such that the first edge jumps over q.

The values of P and J can be found by dynamic programming. It is easy to compute J[u, v] (if we already know P[w, v] for all u < w < v) by inspecting the neighbours of u; conversely, we can also compute P[u, v] efficiently using the table J. If no forbidden pair ends in v or vertex u is "inside" the forbidden pair $(q, v) \in F$ (i.e., q < u), we just search the neighbours of v for a vertex that could be penultimate on the u-v path. Otherwise, let $(q, v) \in F$ be a forbidden pair such that u < q < v. Suppose that a safe u-v path exists and let w be the last vertex on this path before q. Then P[u, w] and J[w, v] are both true. Conversely, if P[u, w] and J[w, v] are true for some $u \le w < q$, by concatenating the corresponding paths, we get a safe u-v path: the path obviously avoids all forbidden pairs before or after q (from the definition of P[u, w] and J[w, v]), (q, v) is the only pair containing v, and there are no forbidden pairs halving (q, v).

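Thus, P[s, t] can be computed in cubic time (each of the $O(n^2)$ entries is an OR over O(n) terms) using the following two recurrences:

$$
J[u,v] = \begin{cases}
\bigvee_{(u,w) \in E,\ q < w} P[w,v] & \text{if } u < q \text{ and } (q,v) \in F \text{ is a forbidden pair} \qquad (1)\\
\text{undefined} & \text{otherwise}
\end{cases}
$$

$$
P[u,v] = \begin{cases}
\text{true} & \text{if } u = v\\
\text{false} & \text{if } (u,v) \in F \text{ is a forbidden pair}\\
\bigvee_{u \le w < v,\ (w,v) \in E} P[u,w] & \text{if no forbidden pair ends in } v \text{ or } (q,v) \in F \text{ for } q < u \qquad (2)\\
\bigvee_{u \le w < q} \bigl(P[u,w] \wedge J[w,v]\bigr) & \text{if } (q,v) \text{ is a forbidden pair, } u < q < v \qquad (3)
\end{cases}
$$

□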
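Recurrences (1) to (3) translate almost line by line into code. The sketch below assumes the vertices are numbered 0, ..., n−1 in topological order and that the input has already been modified so that every vertex is an end of at most one forbidden pair; the function name is hypothetical. For each target v, the start vertices u are processed from right to left, so every P[w][v] and J[w][v] with w > u, as well as J[u][v] itself, is available when P[u][v] is computed; undefined J entries simply stay false.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def safe_path_exists(n: int, edges: Iterable[Edge], pairs: Iterable[Edge], s: int, t: int) -> bool:
    """O(n^3) dynamic programming following recurrences (1)-(3) (sketch).

    Assumptions: vertices are 0..n-1 in topological order, the forbidden pairs are
    well-parenthesized, and every vertex is an end of at most one pair.
    """
    out_nb: List[List[int]] = [[] for _ in range(n)]
    in_nb: List[List[int]] = [[] for _ in range(n)]
    for a, b in edges:
        out_nb[a].append(b)
        in_nb[b].append(a)
    start = {}                          # start[v] = q if (q, v) is a forbidden pair with q < v
    for a, b in pairs:
        start[max(a, b)] = min(a, b)
    P = [[False] * n for _ in range(n)]
    J = [[False] * n for _ in range(n)]  # undefined entries stay False
    for v in range(n):
        q = start.get(v)
        for u in range(v, -1, -1):
            if q is not None and u < q:
                # (1): the first edge (u, w) jumps over q
                J[u][v] = any(P[w][v] for w in out_nb[u] if w > q)
            if u == v:
                P[u][v] = True
            elif q == u:
                P[u][v] = False         # (u, v) itself is a forbidden pair
            elif q is None or q < u:
                # (2): w is the penultimate vertex of the u-v path
                P[u][v] = any(P[u][w] for w in in_nb[v] if w >= u)
            else:
                # (3): w is the last vertex before q, u <= w < q
                P[u][v] = any(P[u][w] and J[w][v] for w in range(u, q))
    return P[s][t]


F = [(1, 4), (2, 3)]
print(safe_path_exists(6, [(0, 1), (1, 2), (2, 4), (4, 5), (0, 3), (3, 5)], F, 0, 5))  # True
print(safe_path_exists(6, [(0, 1), (1, 4), (4, 5), (0, 2), (2, 3), (3, 5)], F, 0, 5))  # False
```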

This algorithm can be further improved to $O(n^\omega)$ time by using fast boolean matrix multiplication. The proof is actually simple thanks to the work of Zakov et al. [14], which simplified and generalized Valiant's technique [13]. They introduce a generic problem called Inside Vector Multiplication Template (VMT), which can be solved in subcubic time. A problem is considered an Inside VMT problem if it fulfills the following requirements:

1. The goal of the problem is to compute for every i, j a series of inside properties $\beta^1_{i,j}, \beta^2_{i,j}, \dots, \beta^K_{i,j}$.

2. Let $1 \le k \le K$, and let $\mu^k_{i,j}$ be the result of a vector multiplication of the form $\mu^k_{i,j} = \bigoplus_{q \in (i,j)} \bigl(\beta^{k'}_{i,q} \otimes \beta^{k''}_{q,j}\bigr)$ for some $1 \le k', k'' \le K$. Assume that the following values are available: $\mu^k_{i,j}$, all values $\beta^{k'}_{i',j'}$ for $1 \le k' \le K$ and $(i', j') \subsetneq (i, j)$, and all values $\beta^{k'}_{i,j}$ for $1 \le k' < k$. Then $\beta^k_{i,j}$ can be computed in o(n) time.

3. In the multiplication variant that is used for computing $\mu^k_{i,j}$, the $\oplus$ operation is associative, and the domain of elements contains a zero element. In addition, there is a matrix multiplication algorithm for this multiplication variant whose running time M(n) over two $n \times n$ matrices satisfies $M(n) = o(n^3)$.

Theorem 4 (Zakov et al. [14]). For every Inside VMT problem there is an algorithm whose running time is $o(n^3)$. In particular, let M(n) be the complexity of the matrix multiplication used and suppose that $\beta^k_{i,j}$ can be computed in O(1) time in item 2 of the definition above. Then the time complexity is $O(M(n) \log n)$ if $M(n) = O(n^2 \log^k n)$, and $O(M(n))$ if $M(n) = \Omega(n^{2+\varepsilon})$ for $\varepsilon > 0$ and $4M(n/2) \le d \cdot M(n)$ for some d < 1 and sufficiently large n.

Corollary 1. The PAFP problem with well-parenthesized forbidden pairs can be solved in $O(n^\omega)$ time, where $2 \le \omega < 2.3727$ is the exponent in the complexity of boolean matrix multiplication.
Proof. We formulate our solution from Theorem 3 as an Inside VMT problem. The goal is to compute the inside properties A, J, α, β, P, P', and P''. Properties $J_{u,v}$ and $P_{u,v}$ correspond to the dynamic programming tables from the proof of Theorem 3; the other properties are auxiliary. Property A is the adjacency matrix of graph G and it is constant ($A_{u,v} = 1$ if and only if $(u, v) \in E$). Properties α and β are used to store the partial results from cases (2) and (3) in the computation of P[u, v]. Finally, the auxiliary properties P' and P'' can be computed from P in constant time and are defined as follows:

$$
P'_{w,v} = \begin{cases} P_{w,v} \wedge (q < w) & \text{if } (q,v) \in F\\ \text{false} & \text{otherwise} \end{cases}
\qquad
P''_{w,v} = \begin{cases} P_{w,v} \wedge (w < q) & \text{if } (q,v) \in F\\ \text{false} & \text{otherwise} \end{cases}
$$

Now we can rewrite the computation of $J_{u,v}$ and $P_{u,v}$ using boolean vector multiplication as follows (undefined entries of J are treated as false):

$$J_{u,v} = \bigvee_{(u,w) \in E,\ q < w} P[w,v] = \bigoplus_{w \in (u,v)} \bigl(A_{u,w} \otimes P'_{w,v}\bigr)$$

$$\alpha_{u,v} = \bigvee_{u \le w < v,\ (w,v) \in E} P[u,w] = \bigoplus_{w \in (u,v)} \bigl(P_{u,w} \otimes A_{w,v}\bigr)$$

$$\beta_{u,v} = \bigvee_{u \le w < q} \bigl(P[u,w] \wedge J[w,v]\bigr) = \bigoplus_{w \in (u,v)} \bigl(P_{u,w} \otimes J_{w,v}\bigr)$$

Property $P_{u,v}$ can be computed from $\alpha_{u,v}$ and $\beta_{u,v}$ in constant time. □
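The vector products above are ordinary Boolean matrix products. The sketch below only illustrates this point, using Python integers as bit-rows so that each step ORs a whole row at once (a word-level speed-up in the spirit of, but not identical to, the Four-Russians technique); it is not the Inside VMT algorithm of Zakov et al., which additionally orders the products so that the subcubic bound applies. The helper names are hypothetical. In a topologically ordered DAG, $P_{u,w}$ is false for w < u and $A_{w,v}$ is false for $w \ge v$, so the product automatically restricts w to $u \le w < v$, which is exactly the range used for α.

```python
from typing import List, Sequence


def to_bitrows(M: Sequence[Sequence[int]]) -> List[int]:
    """Encode each row of a 0/1 matrix as a Python int (bit j = entry in column j)."""
    return [sum(1 << j for j, x in enumerate(row) if x) for row in M]


def bool_matmul(X: List[int], Y: List[int]) -> List[int]:
    """Boolean product Z[u][v] = OR_w (X[u][w] AND Y[w][v]) on bit-encoded rows."""
    Z = []
    for row in X:
        acc, w = 0, 0
        while row:
            if row & 1:          # X[u][w] is true: OR in the whole row w of Y at once
                acc |= Y[w]
            row >>= 1
            w += 1
        Z.append(acc)
    return Z


P = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]   # safe u-w paths in the path graph 0 -> 1 -> 2
A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # its adjacency matrix
alpha = bool_matmul(to_bitrows(P), to_bitrows(A))
print([bin(r) for r in alpha])  # ['0b110', '0b100', '0b0']
```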



5. The other cases and concluding remarks 

Note that the $O(n^\omega)$ algorithm for well-parenthesized forbidden pairs also improves upon the result by Chen et al. [4] for the nested case. It remains an open problem whether there is a more efficient algorithm for the nested case.

An $O(n^{\omega+1})$ time algorithm for halving forbidden pairs is achieved by a refined version of the algorithm given by Kolman and Pangrac [7]. Recall that in this case, the input graph G consists of two parts: all the forbidden pairs start in the first part, and end in the second part in the same order. Let us denote the vertices in the first part $s < x_1 < \dots < x_n$ and the vertices in the second part $y_1 < \dots < y_n < t$, where $\{x_i, y_i\}$ are forbidden pairs. We may assume that all vertices are accessible from s and that t is accessible from every vertex.

If there is a direct edge from s to the second part or if there is an edge from the first part to t, a safe s-t path exists trivially: by the accessibility assumption, this edge extends to an s-t path whose inner vertices all lie in one part, and no forbidden pair has both ends in the same part. Otherwise, we reduce the halving case to n instances of the nested case. There will be a safe s-t path in G if and only if there is a safe s-t' path in at least one of the produced instances.

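First, remove all edges $(x_i, y_j)$ leading from the first part to the second part, add a new terminal vertex t', and reverse the direction of all edges in the second part of G. Note that in this new order, $s < x_1 < \dots < x_n < t < y_n < \dots < y_1 < t'$, the forbidden pairs are nested. The k-th instance is obtained by adding the edges $(x_k, t)$ and $(y_\ell, t')$ for each removed edge $(x_k, y_\ell)$, so there is a safe s-t path $s, \dots, x_k, y_\ell, \dots, t$ in the original graph G if and only if there is a safe s-t' path $s, \dots, x_k, t, \dots, y_\ell, t'$ in the k-th instance. Since each of the n instances can be solved in $O(n^\omega)$ time by the algorithm for well-parenthesized pairs, the total running time is $O(n^{\omega+1})$.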
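The reduction can be sketched as follows, under these assumptions: vertices are named "s", "t", ("x", i), and ("y", i), the new terminal t' is called "t2", and an exponential brute-force search stands in for the $O(n^\omega)$ nested-case algorithm. All names are hypothetical, and the sketch follows the reading above in which every edge from the first part to the second part is removed and remembered. In each instance, the order s < x_1 < ... < x_n < t < y_n < ... < y_1 < t2 makes the pairs nested.

```python
from typing import Dict, List, Set, Tuple

# Vertices: "s", "t", ("x", i) and ("y", i) for i = 1..n; the forbidden pairs are {("x", i), ("y", i)}.


def part(v) -> int:
    """1 for s and the x-vertices (first part), 2 for the y-vertices and t (second part)."""
    return 1 if v == "s" or v[0] == "x" else 2


def has_safe_path(edges, pairs, s, t) -> bool:
    """Exponential brute force; a stand-in for the nested-case algorithm."""
    adj: Dict = {}
    partner: Dict = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    for a, b in pairs:
        partner.setdefault(a, set()).add(b)
        partner.setdefault(b, set()).add(a)

    def dfs(u, seen: frozenset) -> bool:
        return u == t or any(dfs(w, seen | {w}) for w in adj.get(u, [])
                             if not partner.get(w, set()) & seen)

    return dfs(s, frozenset([s]))


def halving_to_nested(n: int, edges: Set[Tuple]) -> List[Tuple[List, List, List]]:
    """The n nested instances as (order, edges, pairs); "t2" plays the role of t'."""
    first = [(a, b) for a, b in edges if part(a) == part(b) == 1]
    second = [(b, a) for a, b in edges if part(a) == part(b) == 2]        # reversed
    crossing = [(a, b) for a, b in edges if part(a) == 1 and part(b) == 2]  # removed
    assert all(a != "s" and b != "t" for a, b in crossing), "trivial case, handled separately"
    order = (["s"] + [("x", i) for i in range(1, n + 1)] + ["t"]
             + [("y", i) for i in range(n, 0, -1)] + ["t2"])
    pairs = [(("x", i), ("y", i)) for i in range(1, n + 1)]
    instances = []
    for k in range(1, n + 1):
        extra = [(("x", k), "t")] + [(b, "t2") for a, b in crossing if a == ("x", k)]
        instances.append((order, first + second + extra, pairs))
    return instances


def solve_halving(n: int, edges: Set[Tuple]) -> bool:
    if any(part(a) == 1 and part(b) == 2 and (a == "s" or b == "t") for a, b in edges):
        return True  # trivial case; relies on the accessibility assumption
    return any(has_safe_path(E, F, "s", "t2") for _, E, F in halving_to_nested(n, edges))


print(solve_halving(2, {("s", ("x", 1)), (("x", 1), ("y", 2)), (("y", 2), "t")}))  # True
print(solve_halving(2, {("s", ("x", 1)), (("x", 1), ("y", 1)), (("y", 1), "t")}))  # False
```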

It remains an open problem whether a more efficient algorithm exists. 